Rough Set based Rule Induction Package for R

Authors

  • Shusaku Tsumoto
  • Shoji Hirano
Abstract

Rough set theory is a framework for dealing with uncertainty based on the computation of equivalence relations and equivalence classes. Since a probability is defined as a measure on a sample space that is itself defined by equivalence classes, rough sets are closely related to probability at a deep mathematical level. Furthermore, since rough sets are closely related to Dempster-Shafer theory and fuzzy sets, the theory can be viewed as a bridge between classical probability and such subjective probabilities. It is also closely related to Bayesian theories. Applications of the theory include feature selection, rule induction, and categorization of numerical variables, which can be viewed as methods for categorical data analysis. In particular, rough sets have been widely used in data mining as a tool for feature selection and for extracting if-then rules from data. The theory also includes a visualization method called "flow graphs."

This paper introduces a rough set based rule induction package for R, including:

(1) Feature selection: in rough set terminology, a set of independent variables is called a "reduct." Its calculation is based on comparing the equivalence classes represented by the variables with respect to their degree of independence.
(2) Rule induction: rough sets provide a rule extraction algorithm based on reducts. The rules obtained from this subpackage are if-then rules.
(3) Discretization (categorization of numerical variables): discretization can be viewed as a sequential comparison between the equivalence classes given in a dataset.
(4) Rough clustering: the calculation of similarity measures can also be viewed as a comparison between equivalence classes. The rough clustering method gives an indiscernibility-based clustering with iterative refinement of equivalence relations.
(5) Flow graph: this subpackage visualizes the network structure of relations between given variables. Unlike Bayesian networks, not only conditional probabilities but also other subjective measures are attached to each edge.
(6) Rule visualization with MDS: this subpackage uses multidimensional scaling to visualize similarity relations between rules.

Using R gives the following advantages: (a) rough set methods can be implemented easily with fundamental R functions, and (b) rough set methods can be combined easily with statistical methods through the rich set of R packages. At the conference, several aspects of this package and experimental results will be presented.
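
The ideas behind items (1) and (2) can be illustrated with a minimal sketch. It is written in Python, not R, and does not reproduce the package's functions, which the abstract does not describe. The toy decision table, the attribute names, and every function name below are assumptions made for illustration. Reducts are found here by brute force as minimal attribute subsets that preserve the positive region, which is one standard textbook definition and not necessarily the procedure the package uses.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy decision table (not taken from the paper).
ROWS = [
    {"headache": "no",  "temp": "normal",    "nausea": "no",  "flu": "no"},
    {"headache": "yes", "temp": "normal",    "nausea": "no",  "flu": "no"},
    {"headache": "yes", "temp": "high",      "nausea": "yes", "flu": "yes"},
    {"headache": "no",  "temp": "high",      "nausea": "no",  "flu": "no"},
    {"headache": "no",  "temp": "very_high", "nausea": "yes", "flu": "yes"},
    {"headache": "yes", "temp": "very_high", "nausea": "no",  "flu": "yes"},
]
COND = ["headache", "temp", "nausea"]  # condition attributes
DEC = "flu"                            # decision attribute


def partition(rows, attrs):
    """Equivalence classes (sets of row indices) of the indiscernibility
    relation induced by the attributes in attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(rows, attrs, target):
    """Lower and upper approximations of the index set target."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


def positive_region(rows, attrs, dec):
    """Rows whose decision value is determined unambiguously by attrs."""
    pos = set()
    for dec_class in partition(rows, [dec]):
        pos |= approximations(rows, attrs, dec_class)[0]
    return pos


def reducts(rows, cond, dec):
    """Brute-force search for minimal subsets of cond that keep the
    positive region of the full condition set (exponential in len(cond))."""
    full = positive_region(rows, cond, dec)
    found = []
    for size in range(1, len(cond) + 1):
        for subset in combinations(cond, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # a smaller reduct is contained in it: not minimal
            if positive_region(rows, subset, dec) == full:
                found.append(subset)
    return found


def induce_rules(rows, attrs, dec):
    """One if-then rule per consistent equivalence class of attrs."""
    pos = positive_region(rows, attrs, dec)
    rules = []
    for block in partition(rows, attrs):
        if block <= pos:
            rep = rows[min(block)]  # any member represents the class
            rules.append(({a: rep[a] for a in attrs}, rep[dec]))
    return rules


if __name__ == "__main__":
    for reduct in reducts(ROWS, COND, DEC):
        print("reduct:", reduct)
        for condition, decision in induce_rules(ROWS, reduct, DEC):
            print("  IF", condition, "THEN", DEC, "=", decision)
```

On this toy table the search returns two reducts, ("headache", "temp") and ("temp", "nausea"), so "temp" is the only attribute shared by every reduct. The brute-force search is adequate only for a handful of attributes; practical tools rely on heuristics or discernibility-matrix methods.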


Similar resources

Implementing algorithms of rough set theory and fuzzy rough set theory in the R package "RoughSets"

The package RoughSets, written mainly in the R language, provides implementations of methods from rough set theory (RST) and fuzzy rough set theory (FRST) for data modeling and analysis. It covers not only fundamental concepts (e.g., indiscernibility relations and lower/upper approximations), but also their applications in many tasks: discretization, feature selection, instance select...


Application of R to Data Mining in Hospital Information Systems

Hospital information systems have been in use since the 1980s and have stored a large amount of laboratory examination data. Thus, the reuse of such stored data has become a critical issue in medicine and may play an important role in medical decision support. On the other hand, data mining emerged from the computer science side in the early 1990s and can be viewed as a re-invention of what s...


Rough Sets in Bioinformatics

Rough set-based rule induction allows easily interpretable descriptions of complex biological systems. Here, we review a number of applications of rough sets to problems in bioinformatics, including cancer classification, gene and protein function prediction, gene regulation, protein-drug interaction and drug resistance.


VC-DomLEM: Rule induction algorithm for variable consistency rough set approaches

We present a general rule induction algorithm based on sequential covering, suitable for variable consistency rough set approaches. This algorithm, called VC-DomLEM, can be used for both ordered and non-ordered data. For ordered data, the rough set model employs a dominance relation; for non-ordered data, it employs an indiscernibility relation. VC-DomLEM generates a minima...


A Comparative Study on Strategies of Rule Induction for Incomplete Data Based on Rough Set Approach

Rough set based rule induction approaches have been studied intensively during the past few years. However, the classical rough set model cannot deal with incomplete data sets. There are two main categories of approaches to this problem: preprocessing methods and extensions of the rough set model. This paper focuses on the comparison of three strategies for dealing with incomplete data containing three...



Publication date: 2004